28 research outputs found

    Bi-Modal Person Recognition on a Mobile Phone: using mobile phone data

    Get PDF
    This paper presents a novel, fully automatic bi-modal (face and speaker) recognition system that runs in real-time on a mobile phone. The implemented system runs in real-time on a Nokia N900 and demonstrates the feasibility of performing both automatic face and speaker recognition on a mobile phone. We evaluate this recognition system on a novel, publicly available mobile phone database and provide a well-defined evaluation protocol. This database was captured almost exclusively using mobile phones and aims to improve research into deploying biometric techniques on mobile devices. We show, on this mobile phone database, that face and speaker recognition can be performed in a mobile environment, and that score fusion can improve performance by more than 25% in terms of error rates.
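    As a rough illustration of the score-fusion idea mentioned in this abstract (the paper's own fusion code is not reproduced here), a weighted-sum fusion of face and speaker scores might look like the sketch below; the weights, threshold, and score range are hypothetical.

```python
def fuse_scores(face_score, speaker_score, w_face=0.5, w_speaker=0.5):
    """Weighted-sum fusion of two modality scores (illustrative only).

    Scores are assumed to be normalised to a comparable range, e.g. [0, 1];
    the weights are hypothetical and would normally be tuned on development data.
    """
    return w_face * face_score + w_speaker * speaker_score


def accept(face_score, speaker_score, threshold=0.6):
    """Accept the identity claim if the fused score exceeds a hypothetical threshold."""
    return fuse_scores(face_score, speaker_score) >= threshold


# Example: a moderately confident face match combined with a strong speaker match.
print(accept(0.55, 0.80))  # True with these illustrative weights and threshold
```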

    Automatically Derived Units in Speech Processing

    No full text
    Current systems for recognition, synthesis, very low bit-rate (VLBR) coding and text-independent speaker verification rely on sub-word units determined using phonetic knowledge. This paper presents an alternative to this approach: determination of speech units using ALISP (Automatic Language Independent Speech Processing) tools. Experimental results for speaker-dependent VLBR coding are reported on two databases: an average rate of 120 bps for unit encoding was achieved. In verification, this approach was tested during the 1998 NIST-NSA evaluation campaign with an MLP-based scoring system.
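    The reported average rate of 120 bps for unit encoding can be sanity-checked with simple arithmetic; the sketch below uses hypothetical values for the unit inventory size and unit rate, not figures taken from the paper.

```python
import math

# Hypothetical inventory of automatically derived units (not from the paper).
num_units = 256            # bits per unit index = log2(256) = 8
units_per_second = 15.0    # hypothetical average number of units per second of speech

bits_per_unit = math.log2(num_units)
rate_bps = bits_per_unit * units_per_second
print(f"{rate_bps:.0f} bps")  # 120 bps with these illustrative numbers
```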

    Audio Surveillance through Known Event Classification

    No full text
    An approach to audio surveillance through known-event classification is presented, introducing a simple yet efficient framework. The use of the proposed system for unknown-event detection is also suggested and evaluated. Further, a specific audio event is detected with the help of audio classification, which lets the detection focus on a signal with specific behavior. It is thus shown that the system can be used in several applications.
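    A minimal sketch of the known-event classification idea, assuming a scikit-learn-style pipeline over pre-computed per-clip audio features; the feature set, classifier, and unknown-event threshold are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical training data: rows are per-clip feature vectors (e.g. averaged MFCCs),
# labels name known audio events such as "gunshot", "glass_break", "background".
X_train = np.random.rand(60, 13)
y_train = np.random.choice(["gunshot", "glass_break", "background"], size=60)

# Train a simple classifier over the known event classes.
clf = make_pipeline(StandardScaler(), SVC(probability=True))
clf.fit(X_train, y_train)

# For unknown-event detection, a low maximum class probability can flag clips
# that fit none of the known events (the 0.5 threshold is purely illustrative).
X_test = np.random.rand(5, 13)
probs = clf.predict_proba(X_test)
is_unknown = probs.max(axis=1) < 0.5
print(clf.predict(X_test), is_unknown)
```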

    Speech spectrum representation and coding using multigrams with distance

    No full text
    International audience

    Diphone-like units without phonemes - option for very low bit rate speech coding

    No full text
    International audience

    Regularized subspace n-gram model for phonotactic iVector extraction

    No full text
    Phonotactic language identification (LID) by means of n-gram statistics and discriminative classifiers is a popular approach to the LID problem. Low-dimensional representation of the n-gram statistics allows more diverse and efficient machine learning techniques to be used in LID. Recently, we proposed the phonotactic iVector as a low-dimensional representation of the n-gram statistics. In this work, an enhanced model of the n-gram probabilities along with regularized parameter estimation is proposed. The proposed model consistently improves LID system performance across all conditions, by up to 15% relative to the previous state-of-the-art system. The new model also reduces the memory requirements of iVector extraction and helps to speed up subspace training. Results are presented in terms of Cavg on the NIST LRE2009 evaluation set.
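    For context, the n-gram statistics underlying phonotactic LID are counts of short phone sequences produced by a phone recogniser; the sketch below shows how such a count vector might be built from a transcript. All symbols and parameters are illustrative, and the subspace (iVector) model that compresses this vector is not shown.

```python
from collections import Counter
from itertools import product


def ngram_statistics(phone_sequence, phone_set, n=2):
    """Return a fixed-length vector of n-gram counts over a known phone inventory.

    In phonotactic LID, this high-dimensional count vector is the input that a
    subspace (iVector) model compresses into a low-dimensional representation.
    """
    index = {gram: i for i, gram in enumerate(product(phone_set, repeat=n))}
    grams = zip(*(phone_sequence[i:] for i in range(n)))
    vector = [0] * len(index)
    for gram, count in Counter(grams).items():
        if gram in index:
            vector[index[gram]] = count
    return vector


# Illustrative phone inventory and a toy transcript from a phone recogniser.
phones = ["a", "b", "k", "s"]
transcript = ["a", "b", "a", "k", "s", "a", "b"]
print(ngram_statistics(transcript, phones))
```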

    SpeechDat-E: Five Eastern European Speech Databases for Voice-Operated Teleservices Completed

    Get PDF
    Contains full text: 76438.pdf (author's version, Open Access), 4 pages.
